De-anonymizing Genomic Databases Using Phenotypic Traits

نویسندگان

  • Mathias Humbert
  • Kévin Huguenin
  • Joachim Hugonot
  • Erman Ayday
  • Jean-Pierre Hubaux
چکیده

People increasingly have their genomes sequenced and some of them share their genomic data online. They do so for various purposes, including to find relatives and to help advance genomic research. An individual’s genome carries very sensitive, private information such as its owner’s susceptibility to diseases, which could be used for discrimination. Therefore, genomic databases are often anonymized. However, an individual’s genotype is also linked to visible phenotypic traits, such as eye or hair color, which can be used to re-identify users in anonymized public genomic databases, thus raising severe privacy issues. For instance, an adversary can identify a target’s genome using known her phenotypic traits and subsequently infer her susceptibility to Alzheimer’s disease. In this paper, we quantify, based on various phenotypic traits, the extent of this threat in several scenarios by implementing de-anonymization attacks on a genomic database of OpenSNP users sequenced by 23andMe. Our experimental results show that the proportion of correct matches reaches 23% with a supervised approach in a database of 50 participants. Our approach outperforms the baseline by a factor of four, in terms of the proportion of correct matches, in most scenarios. We also evaluate the adversary’s ability to predict individuals’ predisposition to Alzheimer’s disease, and we observe that the inference error can be halved compared to the baseline. We also analyze the effect of the number of known phenotypic traits on the success rate of the attack. As progress is made in genomic research, especially for genotype-phenotype associations, the threat presented in this paper will become more serious.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Identification of genomic loci controlling phenologic and morphologic traits in barley (Hordeum vulgare L.) genotypes using association analysis

Association mapping is a technique with high resolution for QTL mapping based on linkage disequilibrium and has shown more promising for describing genetically complex traits. In addition, it is a powerful tool for describing complex agronomic traits and identifying alleles that can contribute to enhance the desired traits. In this study, whole genome association mapping was used in a set of 14...

متن کامل

اهمیت خویشاوندی ژنتیکی و رکورد فنوتیپی بر صحت ژنومی داده‌های جانهی شبیه‌ سازی شده با استفاده از مدل های حیوانی در حضور اثرات متقابل ژنوتیپ و محیط

The objective of this study was to investigate the role of genetic relationships between training and validation set with considering different ratio of phenotypic records of training set on accuracy of genomic prediction via animal models containing genotype × environment interactions in simulated imputation data. For this purpose, four different scenarios using 15k density containing differen...

متن کامل

Open Window: When Easily Identifiable Genomes and Traits Are in the Public Domain

"One can't be of an enquiring and experimental nature, and still be very sensible."--Charles Fort. As the costs of personal genetic testing "self-quantification" fall, publicly accessible databases housing people's genotypic and phenotypic information are gradually increasing in number and scope. The latest entrant is openSNP, which allows participants to upload their personal genetic/genomic a...

متن کامل

Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping

The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs...

متن کامل

LAILAPS: The Plant Science Search Engine

With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics method...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • PoPETs

دوره 2015  شماره 

صفحات  -

تاریخ انتشار 2015